Abstract
This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with sentence simplification using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT, a pre-trained contextualised model, as well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique applying the multilingual Text-to-Text Transfer Transformer mT5. We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel “Saaq al-Bambuu”. We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments.
1. Introduction


Text Simplification (TS) is a Natural Language Processing (NLP) task that aims to reduce the linguistic complexity of a text while maintaining its meaning and original information (Saggion, 2017; Siddharthan, 2002; Collados, 2013). Shardlow (2014) stated that the TS task may include lexical and/or syntactic simplification to produce a new equivalent text which conveys the same meaning and message with simpler words and structure. According to this definition, TS involves transforming a text with new lexical items and/or rewriting sentences to ensure both its readability and understandability for the target audience (Bott and Saggion, 2011). This definition also suggests that TS could be classified as a type of Text Style Transfer (TST), where the target style of the generated text is “simple” (Jin et al., 2020). TS is important for several reasons: (i) it is used in designing and simplifying the language curriculum for both second-language and first-language learners, in making text easy to read for early first-language learners, and in assisting first-language users with cognitive impairments or low literacy levels; (ii) it is a fundamental pre-processing step in NLP applications such as text retrieval, extraction, summarization, categorization and translation (Saggion, 2017); and (iii) it acts as a post-processing step in automatic speech recognition. Some scholars hold the view that automating the TS task is very difficult (Petersen and Ostendorf, 2007; [...] Koller, 2008). In their opinion, the concept of easy-to-read texts is not universal, as it is nearly impossible to simplify a text so that it meets the needs of all individuals with reading and comprehension problems ([...] Hervás et al., 2014). However, most of the research up to now has provided promising attempts to reach this goal.


Accordingly, the TS task varies depending on the final application or the target audience. Hence, there are various types of simplification systems, depending on the purpose and the end-user of the system. A reasonable approach to tackle this issue could be to follow a general simplification strategy. There are three key aspects of simple text: (i) it is made up of common simple words, simple sentences, and direct language; (ii) unnecessary information is omitted; (iii) it can be shorter in number of words, but with a larger number of sentences (Bott and Saggion, 2011; Collados, 2013). Collados (2013) approached TS differently, arguing that a slightly simplified text for one user is generally simpler for any other user, but that a deep simplification for a specific user may lead to a more complex text for another user.
TS is an active NLP research area. As in other ongoing research, its techniques show a drift from manually hand-crafted rules towards deep learning techniques (Sikka and Mago, 2020; [...] Azmi, 2021). Most of these techniques were borrowed from closely related NLP tasks such as Machine Translation (Sikka and Mago, 2020). This has motivated us to examine the effectiveness of two different methods for addressing the sentence simplification (SS) task, as follows:


(1) Classification Approach SS is considered as a classification task that requires a decision on which word to replace or which syntactic structure to regenerate in each complex sentence. This approach allows the application of the Lexical Simplification (LS) pipeline, which aims to control the readability of the text and make it more accessible to readers with various intellectual abilities. LS particularly involves word changes, so we experiment with the effect of different embedding representations on the word classification decision. This approach highlights how the simplification is affected by applying either word embeddings or contextualised embeddings such as BERT (Devlin et al., 2018).


(2) Generative Approach SS is considered as a translation task, in which the translation is done within the same language from a complex sentence as the source to a simplified sentence as the target ([...] et al., 2010). From this perspective, a generative SS model can be implemented using Machine Translation (MT) and monolingual text-to-text generation techniques. Thus, we combined all LS steps into one process which learns from the complex sentence how to generate the simple version. For this purpose, we applied a BERT-like pre-trained transformer to perform sequence-to-sequence (Seq2Seq) generation.
The main contribution of this paper is to examine different approaches to the Arabic sentence simplification task using automatic and manual evaluation. To our knowledge, this is the first available Arabic sentence-level simplification system.¹


2. Corpus and Tools 


The corpus used for training is a set of complex/simple parallel sentences that have been compiled from the internationally acclaimed Arabic novel “Saaq al-Bambuu”, which has an authorized simplified version for students of Arabic as a second language ([...]iar and Assaf, 2016). We assume that a successful sentence simplifier should be able to detect the words and sentences in the original text that require simplification and simplify them in the same way as their simple counterparts. The dataset consists of 2980 parallel sentences, as illustrated in Table 1, classified according to the Common European Framework of Reference for Languages (CEFR), an international standard for describing language ability ranging from A1, A2... up to C2.














Levels        | Sentences | Tokens
Simple (A+B)  | 2980      | 34447
Complex (C)   | 2980      | 46521
Total         | 5960      | 80968

Table 1: Number of sentences and tokens available per CEFR level in the Saaq al-Bambuu parallel corpus


We aligned the words in the parallel “Saaq al-Bambuu” sentences using Eflomal,² a word alignment tool that uses a Bayesian model with Markov Chain Monte Carlo (MCMC) inference (Östling and Tiedemann, 2016).

¹ To be shared in open source by the time of publication.
² https://github.com/robertostling/


After aligning the words, we automatically identified four basic simplification types at word level and sentence level (Alva-Manchego et al., 2017), then annotated these types with the following labels:
• Deletions, DELETE (D): words deleted from the complex sentence. [word-level]
• Additions, ADD (A): words added in the simplified sentence. [sentence-level]
• Substitutions, REPLACE (R): a word in the complex sentence is replaced by a new word in the simplified sentence. [word-level]
• Rewrites, REWRITE (RW): words shared in both complex and simple sentence pairs. [sentence-level]
The overall counts of the simplification operations in the “Saaq al-Bambuu” corpus are illustrated in Figure 1. The REWRITE operation (keeping the word as it is in both versions) has the highest proportion, with 21899 words copied into the simplified version. In second place, 12561 words were deleted to simplify the sentence and annotated with the DELETE label. In third position comes the REPLACE operation, in which 9082 words were substituted with their simple counterparts. Finally, only 362 words were added to simplify the sentences and annotated with the ADD label.
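
As a quick arithmetic check on the counts reported above, the share of each operation can be computed directly. This is a minimal sketch: the counts are copied from the text, while the variable and function names are hypothetical and not part of the system described here.

```python
# Word counts per simplification operation, copied from the paragraph above.
operation_counts = {
    "REWRITE": 21899,
    "DELETE": 12561,
    "REPLACE": 9082,
    "ADD": 362,
}


def operation_shares(counts):
    """Return each operation's share of all annotated words, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


shares = operation_shares(operation_counts)
# Print the operations from most to least frequent.
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:8s} {operation_counts[label]:6d} {share:5.1f}%")
```

Under these counts, deletions account for a little under 29% of the annotated words and additions for under 1%, which is broadly in line with the labels that remain legible in Figure 1.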


[Figure 1: chart of simplification operations; legible labels: A = Addition 1%, D = Deletion 28%]

Figure 1: Percentage of each simplification operation in the Saaq al-Bambuu corpus


For Part-Of-Speech feature (POS-feature) extraction, we used MADAMIRA, a robust Arabic morphological analyser and part-of-speech tagger ([...] et al., 2014).


3. Method One - Classification approach 


The reference for this approach is the LS pipeline, which focuses on replacing complex vocabulary or phrasal chunks with suitable substitutes (Paetzold and Specia, 2017b). To reach this goal, we implemented three classification models:
1. A classification model based on word embeddings. We applied fastText,³ a word embedding tool that represents words as embedding vectors. These vectors were trained on Common Crawl and Wikipedia. We used the Arabic ar.300.bin file, in which each word is represented by a one-dimensional vector of 300 attributes (Grave et al., 2018);
2. A classification model based on transformers. We used Arabic-BERT,⁴ a transformer model pre-trained on both filtered Arabic Common Crawl and a recent dump of Arabic Wikipedia, containing approximately 8.2 billion words (Safaya et al., 2020);
3. A classification model combining both fastText and Arabic-BERT results with post-editing rules.

³ https://fasttext.cc/docs/en/
The four main steps applied in the LS pipeline are defined as follows:

Complex Word Identification [CWI] is the first step at the top of the pipeline; it distinguishes complex words from simple words in the sentence. Substitution Generation [SG] involves generating all possible substitutions, but without including ambiguous substitutes that would confuse the system in the Substitution Selection step. Substitution Ranking [SR] orders the newly generated substitution list to ease the selection step by giving a high probability to the most appropriate, highly ranked word. Substitution Selection [SS] is responsible for selecting from the ordered SG list the most appropriate substitute according to the context while preserving the same meaning and grammatical structure. Since a word may have multiple meanings, and different meanings have different relevant substitutions, the SS step may generate a wrong substitution, which may lead to meaning corruption. The following part of this paper describes in greater detail the implementation of each step, concentrating on the methods and tools employed.


3.1. Complex word identification 

The CWI step can be viewed as a layered analysis aimed at a better understanding of word complexity. Hence, we applied a lexicon-based approach. Taking one sentence at a time, the first layer identifies the number of syllables of each word in the target sentence, keeping a record of its POS tag along with other features produced by MADAMIRA to be used in further steps. The second layer of analysis assigns each word a CEFR complexity level, adopting a lexicon-based approach that uses a CEFR vocabulary list as a reference to allocate each word in the target sentence to a readability level. Once CWI has identified the complex words, these words become the targets to simplify. Unfortunately, it is impractical to simplify all complex words in a sentence at once, so we first order the words according to their CEFR level and then take each of these words in turn as the target of the simplification process. For example, if a sentence has three complex words assigned B2, C2 and C1, we first order them as C2, C1 and B2 and then start the simplification process by targeting the C2-tagged word, followed by C1 and so on. In this example, this operation results in three sentences, each with a different masked word slot.

⁴ https://huggingface.co/asafaya/
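
The ordering and masking described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the system's implementation: the function name, the placeholder tokens, and the choice of B2 as the lowest level treated as complex are all hypothetical, and the CEFR level of each token is assumed to have been looked up already.

```python
# CEFR levels ordered from easiest to hardest.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


def masked_variants(tokens, levels, threshold="B2", mask="[MASK]"):
    """Return (target_word, masked_sentence) pairs, hardest word first.

    tokens    -- the words of one sentence
    levels    -- one CEFR level per token (assumed to come from a vocabulary list)
    threshold -- lowest level treated as complex (an assumption of this sketch)
    """
    rank = {level: i for i, level in enumerate(CEFR_ORDER)}
    complex_idx = [i for i, lvl in enumerate(levels) if rank[lvl] >= rank[threshold]]
    # Hardest level first; the sort is stable, so ties keep sentence order.
    complex_idx.sort(key=lambda i: -rank[levels[i]])
    variants = []
    for i in complex_idx:
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        variants.append((tokens[i], " ".join(masked)))
    return variants


# Placeholder sentence with three complex words tagged B2, C2 and C1.
tokens = ["w1", "w2", "w3", "w4", "w5"]
levels = ["A1", "B2", "C2", "A2", "C1"]
for word, sentence in masked_variants(tokens, levels):
    print(word, "->", sentence)
```

With this toy input the C2 word is masked first, then the C1 word, then the B2 word, giving three sentences that each carry one masked slot.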


3.2. Substitution Generation and Ranking 


These two steps were treated as one process, using different methodologies to generate the substitution list and rank its members by semantic similarity measures. For this purpose we used different sentence embeddings to produce a list of the ten top-ranked substitutions for the masked token.


3.2.1. Arabic-BERT prediction 

Firstly, for each complex word we use the Arabic-BERT model with BERT's Masked Language Modeling (MLM) task. This task predicts a substitution list for a masked (hidden, complex) token in a sequence given its left and right context. In this process, MLM takes as input a sentence pair, namely the original sequence concatenated with the same sequence in which the target word is replaced by the [MASK] token, and the pair is fed into BERT to obtain the probability distribution of the possible replacements for the masked word. To use any pre-trained BERT model, we need to convert the input data (the sentence's tokens) into an appropriate format so that each sentence can be sent to the pre-trained model to obtain the corresponding embedding, using modules and functions available in Hugging Face's transformers package. For this task, the token vectors are first converted to PyTorch tensors. Secondly, as in next sentence prediction, the beginning and end of each sentence need to be marked before feeding them to the BERT model. For this purpose, a general token [CLS] is added as the first token to represent the hidden state of the whole sentence, along with another token [SEP] identifying the end of a sentence. For example, any input can be represented as: [CLS] original sentence [SEP] sentence with a masked token [SEP], in which [CLS] marks the beginning of the input, the first [SEP] marks the end of the first sentence and the beginning of the following one, and the last [SEP] marks the end of the whole input. With this approach, the model takes into account not only the complex word but also its surrounding context. For example, given this sentence from Arabic Wikipedia:

[Arabic script not recoverable from the extracted text]
tatatalabu min hay’atu almahkamatu wujuba tahdida alhuquq
[But many situations require the court to determine the rights]

The sentence pair construction before feeding into Arabic-BERT is shown in Figure 2. The figure also illustrates part of the prediction list for the [MASK] word “Wujuba” (‘necessity’), obtained by applying the MLM task [BertForMaskedLM] from the Hugging Face library.
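
The input layout described above can be made concrete with a small string-level sketch. It is only an illustration: the helper name is hypothetical, the sentence is the transliteration given above rather than the Arabic script, and whitespace splitting stands in for real tokenization.

```python
def build_mlm_pair(tokens, target_index, mask="[MASK]"):
    """Return '[CLS] original [SEP] masked [SEP]' for one target word."""
    original = " ".join(tokens)
    masked = " ".join(tokens[:target_index] + [mask] + tokens[target_index + 1:])
    return f"[CLS] {original} [SEP] {masked} [SEP]"


# Transliterated example sentence; the complex word is "wujuba".
tokens = "tatatalabu min hay'atu almahkamatu wujuba tahdida alhuquq".split()
pair = build_mlm_pair(tokens, tokens.index("wujuba"))
print(pair)
```

In practice the explicit string is not needed: BERT tokenizers in the transformers package insert [CLS] and [SEP] themselves when they are given two text segments, so the sketch only shows what the resulting sequence looks like.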

One of the most noticeable aspects of BERT is sentence tokenization, an initial step before converting tokens into their corresponding unique IDs [embedding vectors]. An important point about the BERT tokenizer algorithm is the common out-of-vocabulary (OOV) problem: since the model is pre-trained on a specific corpus, its words are limited to those that appeared in this training corpus. As a solution, in testing and prediction, BERT models are designed to replace unseen tokens with a special token [UNK], which stands for unknown token.
However, converting all unseen tokens into [UNK] would take away a lot of information from the input data. Hence, the BERT tokenizer adopts the WordPiece algorithm, which not only splits sentences into words but also breaks words into several subwords. The model represents this splitting by adding “##” at the start of each continuation piece of a word. For example, the BERT tokenization function first splits the word tatatalabu (‘require’) into two subwords (their Arabic script is not recoverable from the extracted text), where the first token is a more commonly seen word piece (prefix) in the corpus, and the second token is prefixed by two hashes ## to indicate that it is a suffix following some other subpart. If there is no way to split the token into subwords, the whole word becomes [UNK]. After this tokenization step, all tokens can be converted into their corresponding IDs.
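
The subword splitting described above follows a greedy longest-match-first strategy, which can be sketched in a few lines. This is a toy illustration, not Arabic-BERT's tokenizer: the vocabulary and the resulting split of the transliterated word are invented for the example.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid split exists, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical vocabulary; the real model's vocabulary and split will differ.
toy_vocab = {"tata", "##talabu", "min"}
print(wordpiece("tatatalabu", toy_vocab))
print(wordpiece("xyz", toy_vocab))
```

With this toy vocabulary the first call yields a prefix piece followed by a ##-prefixed continuation, and the second call falls back to [UNK] because no piece matches.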
[Figure 2: sentence pair fed into Arabic-BERT with part of the prediction list; legible entries: wujub (necessity), alwujb (obligatory), ‘adam (non), Darurah (necessity)]

Figure 2: The complex word in this sentence is “Wujuba” (‘necessity’); to get the simplest replacement candidates we feed it into Arabic-BERT


3.2.2. fastText prediction 

The fastText model is used in two processes. The first ranks the substitutions previously produced by BERT's MLM, by calculating the cosine similarity between each word in the produced list and the target complex word. The second uses the fastText word embeddings themselves to generate a list of possible replacements [SG] and then ranks them by nearest neighbour [SR]. For example, the list generated by fastText for the target complex word in the previous example is shown on the left side of Figure 3, whereas the ranking of Arabic-BERT's prediction list using fastText is shown on the right side of Figure 3.
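
The ranking step amounts to sorting candidates by cosine similarity to the target word's vector. The sketch below shows the computation on invented three-dimensional vectors; the real fastText vectors have 300 dimensions, and the function names and toy values are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(target_vec, candidates):
    """Sort a {word: vector} mapping by similarity to the target, best first."""
    scored = [(word, cosine(target_vec, vec)) for word, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for embeddings of the target and three candidates.
target = [1.0, 0.0, 0.0]
candidates = {
    "darurah": [0.9, 0.1, 0.0],
    "adam": [0.0, 1.0, 0.0],
    "alwujb": [1.0, 0.0, 0.2],
}
ranked = rank_candidates(target, candidates)
for word, score in ranked:
    print(f"{word:8s} {score:.4f}")
```

The same function serves both uses described above: re-ranking a list proposed by the masked language model, or ranking neighbours drawn from the embedding space itself.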
[Figure 3: fastText and Arabic-BERT prediction lists with fastText scores; legible entry: fa-wujub (necessity) | 0.8071]

Figure 3: Arabic-BERT and fastText prediction lists along with the probability obtained from fastText for the word “Wujuba” (‘necessity’)


3.3. Substitution Selection 


At this stage, each complex word in the sentence has different ordered substitution lists based on Arabic-BERT and fastText. Each prediction list is analysed individually to select the most plausible substitute based on the probabilities and some linguistic rules. This allows the system to generate a set of simplified versions of the target sentence, while keeping a record of the semantic similarity and the readability level of the newly produced sentences. The system produces three different simple sentences based on the Arabic-BERT substitute selection, the fastText selection, and a combined decision from both generated lists. The combined decision is a crucial stage, and the system needs to be careful when selecting the best substitute based on different measures. Starting with the Arabic-BERT list, the greater the value, the more common or familiar the word is to a reader, which points to simple words. If the suggested replacement is [UNK], the decision is to ignore the results from Arabic-BERT and rely on the fastText results. Then, the following four rules are applied to limit incorrect selections:

1. Rule 1: if [UNK] is the top-ranked substitute, go to the fastText results.
Check whether the first substitute is [UNK]; in this case the system completely ignores the BERT results, keeps the original word, and relies on the fastText results immediately.

2. Rule 2: if the lemma of any word in the generated list equals the lemma of the original word, exclude that word from the list.
Check whether the lemmas in the predicted list match the lemma of the target word. In this case, we exclude these words from the potential replacements for the target word and keep only the words with a different lemma. These replacements should also share the same POS and Number with the target word.

3. Rule 3: reject the substitute if it is more difficult than the target word.
Check the CEFR level of the new substitute word. The new word's CEFR level should be equal to or lower than the CEFR level of the target word, because the generated list may sometimes contain a substitute that is more frequent than the original word but also more difficult.

4. Rule 4: check whether the new substitute shares the meaning.
The system uses this rule because it gives a level of confidence to the system's selection, after the system has made its decision, based on the previous rules, either to keep the target word or to select the suggested substitute. At this stage, the MADAMIRA English translations [the Gloss feature] of the target and the substitute are compared. If both words share some or all of their possible translations, this gives the system confidence to replace the target with the substitute.
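
The four rules can be read as a filter over the candidate lists. The following sketch is one possible reading, not the system's code: the feature dictionaries, the helper names, the order in which the two lists are consulted, and the toy CEFR levels and glosses are all assumptions made for illustration.

```python
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


def passes_rules(target, cand):
    """Apply Rules 2 to 4 to one candidate; both arguments are feature dicts."""
    # Rule 2: drop candidates sharing the target's lemma; require same POS and number.
    if cand["lemma"] == target["lemma"]:
        return False
    if cand["pos"] != target["pos"] or cand["number"] != target["number"]:
        return False
    # Rule 3: the substitute must not be harder than the target.
    if CEFR_ORDER.index(cand["cefr"]) > CEFR_ORDER.index(target["cefr"]):
        return False
    # Rule 4: target and substitute must share at least one English gloss.
    return bool(set(cand["gloss"]) & set(target["gloss"]))


def select_substitute(target, bert_list, fasttext_list):
    """Return the chosen substitute, or the original word if none qualifies."""
    # Rule 1: a top-ranked [UNK] means the BERT list is ignored entirely.
    if bert_list and bert_list[0]["word"] == "[UNK]":
        pool = fasttext_list
    else:
        pool = bert_list + fasttext_list  # assumed order: BERT first, then fastText
    for cand in pool:
        if cand["word"] != "[UNK]" and passes_rules(target, cand):
            return cand["word"]
    return target["word"]


def entry(word, lemma, cefr, gloss, pos="verb", number="s"):
    """Build a toy feature dict (real features would come from MADAMIRA)."""
    return {"word": word, "lemma": lemma, "cefr": cefr,
            "gloss": set(gloss), "pos": pos, "number": number}


# Invented feature values for illustration only.
target = entry("uhaddiqu", "haddaqa", "C1", {"stare", "gaze"})
bert = [{"word": "[UNK]"}]
fasttext = [entry("yuhaddiqu", "haddaqa", "C1", {"stare"}),
            entry("anzuru", "nazara", "A1", {"look", "gaze"})]
print(select_substitute(target, bert, fasttext))
```

In this toy case the BERT list is skipped because [UNK] is ranked first, the first fastText candidate is rejected for sharing the target's lemma, and the second is accepted because it is easier and shares a gloss with the target.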


4. Method Two: Generative Approach 


Here, we employ a Seq2Seq approach adopting T5, the “Text-to-Text Transfer Transformer”.⁵ T5 is a BERT-like transformer that takes a text as input and is trained to generate the target text for a variety of text-based NLP tasks, such as summarization, translation, question answering and more ([...] et al., 2019). The main difference between BERT and T5 is that BERT is an encoder-only model trained with a Masked Language Model (MLM) objective, whereas T5 employs a unified encoder-decoder Seq2Seq framework (Farahani et al., 2021). The T5 model initially targeted English-language NLP tasks. Recent research extended the model to cover 101 languages, including Arabic: the “multilingual Text-to-Text Transfer Transformer”, Multilingual T5 or mT5 (Xue et al., 2020), is a new variant of T5 pre-trained on a Common Crawl-based dataset. The pre-trained language model was very successful for Natural Language Understanding (NLU) tasks.

The multilingual capabilities of mT5 and the suitability of the Seq2Seq format for language generation give it the flexibility to perform any NLP task without having to modify the model architecture in any way. This experiment employs the ‘MT5ForConditionalGeneration’ class, which is used for language generation. We trained a TS model using the “Saaq al-Bambuu” parallel sentences on top of the mT5-base model.⁶ The system was developed in a Python 3.8 environment using other toolkits such as the Natural Language Toolkit (NLTK)⁷ and Scikit-learn.⁸ Our sentence corpus was randomly split into 80% for training and 20% for testing.

⁵ https://simpletransformers.ai/docs/t5-specifics/
⁶ google/mt5-base, available through the Hugging Face repository, https://huggingface.co/google/
⁷ http://www.nltk.org/
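
A random 80/20 split of parallel sentence pairs can be sketched with the standard library alone. The function name, the fixed seed and the placeholder pairs are assumptions made for illustration; the split actually used may have been produced with other tooling.

```python
import random


def split_pairs(pairs, test_ratio=0.2, seed=0):
    """Shuffle sentence pairs and return (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder pairs standing in for the 2980 aligned complex/simple sentences.
pairs = [(f"complex_{i}", f"simple_{i}") for i in range(2980)]
train, test = split_pairs(pairs)
print(len(train), len(test))
```

Keeping each complex sentence together with its simple counterpart in one tuple ensures that a pair is never divided between the training and testing portions.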


5. Evaluation 


As with its techniques, most TS evaluation approaches have been derived from other similar NLP research areas. Various evaluation methods have been applied across studies to measure the three main aspects of the newly generated text: i) fluency, referring to grammatical well-formedness and structural simplicity; ii) adequacy, meaning preservation; iii) simplicity, being more readable. All methods were evaluated on the same test dataset, which consisted of 299 randomly chosen sentences excluded from training. We employed both automatic and manual evaluation to compare both systems.


5.1. Automatic Evaluation

BERTScore is an evaluation metric that computes cosine similarity scores using BERT-style embeddings from a pre-trained transformer model. As such models provide a better representation of linguistic structure, BERTScore correlates better with human judgments of sentence similarity. BERTScore overcomes the limitations of previous n-gram-based evaluation metrics, such as the machine translation metric BLEU (Papineni et al., 2002) and SARI (Xu et al., 2016). These methods were not able to capture two main simplification features: 1) changes in word order as a paraphrasing simplification method; 2) preservation of the deep-structure meaning despite changes in the surface form. Moreover, BERTScore gives the option of using different pre-trained transformer models, applying baseline rescaling to adjust the output scores. This allowed us to determine the performance of different BERT models trained on Arabic: (i) the default, multilingual BERT (mBERT) (Devlin et al., 2018), which is selected based on the chosen language, Arabic in this case; (ii) ARBERT,⁹ which was trained on a collection of six Arabic datasets comprising 61GB of text (6.2B tokens) ([...]); (iii) AraBERTv0.2-base,¹⁰ whose training data consist of 77GB of sentences (8.6B tokens) (Antoun et al., 2020). Although AraBERT has been trained on a larger corpus than ARBERT, the latter uses the WordPiece tokeniser, as illustrated before, whereas AraBERT relies on the SentencePiece tokeniser, which uses spaces as word boundaries. These two parameters are reflected in the BERTScore metrics.

⁸ https://scikit-learn.org/stable/
⁹ https://github.com/UBC-NLP/arbert
¹⁰ https://huggingface.co/aubmindlab/bert-base-arabert
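
The core of BERTScore is a greedy matching of token embeddings by cosine similarity: recall averages, over reference tokens, the best match in the candidate; precision does the same over candidate tokens; F1 is their harmonic mean. The sketch below shows this on toy vectors. It is a simplification: real scores use contextual embeddings from a transformer, and the single shared baseline used here for rescaling is an assumption of the sketch.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]


def bertscore(reference_vecs, candidate_vecs, baseline=None):
    """Return (precision, recall, f1) from greedy cosine matching of token vectors."""
    ref = [_normalize(v) for v in reference_vecs]
    cand = [_normalize(v) for v in candidate_vecs]
    # sim[i][j] is the cosine similarity of reference token i and candidate token j.
    sim = [[sum(a * b for a, b in zip(r, c)) for c in cand] for r in ref]
    recall = sum(max(row) for row in sim) / len(ref)
    precision = sum(max(sim[i][j] for i in range(len(ref)))
                    for j in range(len(cand))) / len(cand)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    if baseline is not None:
        # Linear rescaling against a baseline score (one shared value in this sketch).
        precision, recall, f1 = [(s - baseline) / (1 - baseline)
                                 for s in (precision, recall, f1)]
    return precision, recall, f1


# Toy example: the candidate covers only one of the two reference tokens.
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0]]
print(bertscore(reference, candidate))
```

Here precision is perfect because the single candidate token has an exact match, while recall is halved because one reference token has no counterpart, which illustrates how the metric penalises dropped content.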













Classification approach - Automatic Evaluation
The classification system produced three simple versions of the target sentence, using BERT alone, fastText alone, and the combined version. The automatic evaluation compared the scores that different BERT models assign to these sentences, as presented in Table 2. Figure 4 shows the number of changes performed by each classification model. These preliminary results suggest that fastText alone performs unneeded simplifications, resulting in a lower F-1, whereas the higher F-1 of the sentences generated by Arabic-BERT alone suggests that BERT omits necessary changes. Combining the suggestions of both tools enhances the substitution ranking and selection process, which eliminates unnecessary changes and improves performance. In this case, the sentences produced by the combined system achieved P 0.97, R 0.97 and F-1 0.97 using ARBERT.
Figure 4: Number of changed words using fastText-alone, Arabic-BERT-alone and combined


Generative Approach - Automatic Evaluation We tested the 299 sentences to evaluate the generated simplified sequences against the original sentences and the target simple sentences, using three measures as presented in Table 2:
• Original/Target, considered as a reference for the mT5 system.
• Generated/Original, comparing the newly generated sentence with the original complex sentence.
• Generated/Target, comparing the newly generated sentence with the target simple sentence.
To further illustrate the performance of these three models, Figure 5 represents the distribution of F-1 across the testing data instances using the different BERT models. The F-1 plots of the default model are skewed towards the right, reflecting strong similarity across the three parallel sentences (Original/Target/Generated), whereas the AraBERT plots for Original/Target and Generated/Original are skewed to the left, indicating less similarity across the data. ARBERT's plots represent a normal distribution, representing a more accurate similarity measure in the data. This finding suggests that ARBERT, a BERT model that applies a WordPiece tokeniser, performed better in sentence representation.
Figure 5: The F1 scores for each sentence pair; the scores are more spread out, which makes it easy to compare different methods


5.2. Manual Evaluation 


Classification Approach - Manual Evaluation A manual analysis of the sentences produced by the combined system has been performed. The results are displayed in Figure 6 on a scale of good, useful, a bit useful, and useless simplification. The majority, 55% of the new simplified sentences, were either good, useful or a bit useful, while 45% of the sentences were classified as useless simplifications, where the complex word was replaced either by a more complex word or by its antonym. An example of a useful simplification from the combined system is this sentence from “Saaq al-Bambuu”:

[Arabic script not recoverable from the extracted text]
Kuntu ’uhaddiqu fi altabaqi wa-al-samtu yakadu yabtali‘ al-makan.
[I was staring at the plate and the silence almost swallowed up the place.]

In this sentence, the word Uhaddiqu (‘staring’) was replaced by ‘ata’mmalu (‘muse’), which is more frequent and simpler, generating:

[Arabic script not recoverable from the extracted text]

Although it is simpler, it does not match the exact target word ‘Anzurtu (‘look’).

Classification   | P     | R     | F1    | Generation         | P     | R     | F1
Default mBERT                            | Default mBERT
Target/fastText  | 0.962 | 0.966 | 0.964 | Original/Target    | 0.889 | 0.838 | 0.862
Target/BERT      | 0.991 | 0.990 | 0.990 | Generated/Original | 0.806 | 0.725 | 0.762
Target/Combined  | 0.974 | 0.975 | 0.975 | Generated/Target   | 0.754 | 0.723 | 0.736
ARBERT                                   | ARBERT
Target/fastText  | 0.958 | 0.960 | 0.959 | Original/Target    | 0.840 | 0.754 | 0.790
Target/BERT      | 0.990 | 0.991 | 0.990 | Generated/Original | 0.647 | 0.529 | 0.573
Target/Combined  | 0.976 | 0.976 | 0.978 | Generated/Target   | 0.570 | 0.524 | 0.538
AraBERT                                  | AraBERT
Target/fastText  | 0.962 | 0.963 | 0.963 | Original/Target    | 0.879 | 0.823 | 0.848
Target/BERT      | 0.989 | 0.989 | 0.989 | Generated/Original | 0.787 | 0.693 | 0.734
Target/Combined  | 0.975 | 0.976 | 0.976 | Generated/Target   | 0.723 | 0.686 | 0.701

Table 2: Precision, recall and F1 measures using BERTScore with different transformer models

Figure 6: Simplified sentences analysis based on the usefulness of the lexical substitution processes.

Generative Approach - Manual Evaluation Although the initial automatic evaluation provided promising results, the manual evaluation of the generated text provides deeper insight into mT5's output for the Arabic simplification task. According to the manual error analysis shown in Figure 7, only 31 sentences out of the 299 testing instances were correctly simplified. In addition, about 120 generated sentences were incomplete and the system produced 64 meaningless or ill-formed sentences. A significant shortcoming is that the produced sentences tend to repeat the same phrase. Moreover, one of the generated sentences was more complex than the original sentence. Other unexpected errors were simple sentences with different or opposite meanings. An example of a generated sentence with the opposite meaning:

[Arabic script not recoverable from the extracted text]
‘ilaqati bil-Kanisatu fi bilad ’ummi qawiyyah jiddan
[My relationship with the Church in my mother's country is very strong.]

This simple version contradicts the meaning of the original sentence:

[Arabic script not recoverable from the extracted text]
Laysa hunak ma yumayyizu ‘ilaqati bil-Kanisatu fi bilad ’ummi, faziyarati laha qalilah jiddan
[There is nothing that distinguishes my relationship with the Church in my mother's country. My visits to it were very few.]

In this sentence, instead of conveying that his relationship was not strong, the system generated the opposite meaning by expressing a very strong relationship.
On the other hand, mT5 can in some cases produce a perfectly valid paraphrase, which is better than the target simple sentence:

[Arabic script not recoverable from the extracted text]
talab minna al-julus fi salwnahu almaly bi-al-kutub
[He asked us to sit in his salon full of books.]

[Arabic script not recoverable from the extracted text]
Fi salunahu al-saghir almali bil-kutubi, talaba minna al-julusi ’amama maktabi saghiri
[In his small salon full of books, he asked us to sit in front of a small desk.]

In this case, the generated sentence was syntactically simpler than the target while focusing on the main information.


[Figure 7 plot; axis label: Number of Sentences]

Figure 7: Manual error analysis distribution across
testing data


6. Related Works 
Blum and Levenston (1978) completed one of the first
studies to introduce lexical simplification for Teaching
English as a Second Language (TESOL). Some of the
subsequent TS systems applied a rule-based approach
(2014). Most later studies relied on a monolingual
parallel-aligned corpus of original and simplified texts
and applied different machine-learning algorithms, such
as Aluisio et al. (2008) and (2009) for Portuguese,
Collados (2013) for Spanish and Glavaš and Štajner (2015)
for English. Other researchers treated TS as a
monolingual translation problem best solved within the
Statistical Machine Translation (SMT) framework (Specia,
2010; Zhu et al., 2010; Woodsend and Lapata, 2011; Wubben
et al., 2012). The latest English TS studies apply word
embeddings (Paetzold and Specia, 2016; Paetzold and
Specia, 2017a) and BERT transformers to lexical
simplification, as presented in Qiang et al. (2020), who
demonstrate the effectiveness of BERT for the LS task.

Unlike English and other Latin languages, Arabic ATS has
been tackled by only a few researchers.


Al-Subaihin and Al-Khalifa (2011) proposed an Arabic
Automatic Text Simplification (AATS) system called
Al-Basset, an unreleased prototype developed at King Saud
University. The system architecture was structured in the
light of state-of-the-art systems for other languages,
such as SYSTAR, a syntactic simplification system for the
English aphasic or inarticulate population (Carroll et
al., 1998), and SIMPLIFICA, a simplification tool for
Brazilian Portuguese (BP) targeting those with low
literacy levels (Scarton et al., 2010). The design of
Al-Basset comprised four main stages: i) measuring
complexity, for which they would adopt a statistical
language model based on a machine-learning technique
called ARABILITY (Al-Khalifa and Al-Ajlan, 2010); ii)
vocabulary (lexical) simplification, following the LS
pipeline and producing synonyms either by building a new
dictionary or by using Arabic-WordNet (Rodriguez), then
selecting the most common possible synonym using the
Google API; iii) syntactic simplification, for which they
suggested identifying complex structures by applying a
look-up approach to a manually predefined list of Arabic
complex structures; iv) diacritization using the MADA
diacritizer (Habash et al., 2009). The main obstacle to
implementing this system at that point was the
unavailability of basic Arabic resources and tools, such
as dictionaries, corpora and parallel complex-simple
structures, which are the main components of any ATS
system.


Al Khalil et al. (2017) made the second attempt to build
an AATS system, at New York University in Abu Dhabi.
Their simplification system was designed to be
semi-automatic and to simplify modern Arabic fiction; it
involved a linguist using a web-based application to
apply the ACTFL (American Council on the Teaching of
Foreign Languages) language proficiency guidelines to the
simplification of five Arabic novels. They aimed to
provide essential Arabic resources for building ATS and
to formulate manual simplification rules for Arabic
fiction based on the TS state of the art. The first
resource they expected to produce was a corpus consisting
of 1M tokens of the 12-grade curriculum, 5M tokens of
adult novels (original and simplified counterparts), and
500K tokens of children's stories. They also provided a
proposal for the SAMER (Simplification of Arabic
Masterpieces for Extensive Reading) project based on the
corpus analysis. Their guidelines invoke both MADAMIRA
and the CAMAL dependency parser (Shahrour et al., 2016)
for data analysis and classification of their corpus.
They aimed to build a readability measurement identifier
to formulate a four-level graded reader scale (GRS) by
applying various machine-learning classifiers.

It should be noted that neither of them followed up with
any further application or reported its success or
failure.


7. Conclusion 


In this paper, we have presented the first Modern
Standard Arabic sentence simplification system, applying
both classification and generative approaches. On the one
hand, the classification approach focuses on lexical
simplification. We examined different classification
methods and found that a combined method generates
well-formed simple sentences. In addition, word
embeddings and transformers proved to produce a
reasonable set of substitutions for the complex word more
accurately than traditional methods such as WordNet. The
limitations of the classification system are that the
structure of some generated sentences was negatively
affected and that the system misidentifies some complex
words in the CWI step. Even though this limitation
reveals the ineffectiveness of the Arabic CEFR vocabulary
list in identifying complex words, the list fulfils the
substitution replacement step. On the other hand, while
the generative Seq2Seq approach provides a less accurate
simplified version in most cases, in some cases it
outperforms all other approaches by generating a better
simplified sentence than the human-written target
sentence. Nevertheless, one limitation of the generative
approach is the repetition of parts of the same phrase
patterns. Future research will address this issue.
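
A post-processing step for the repetition problem could be as simple as collapsing immediately repeated n-grams. The sketch below is one hypothetical way to do this, assuming whitespace-tokenised output; the function name `remove_repeated_phrases` and the parameters `max_n` and `min_n` are illustrative, and the English example stands in for Arabic model output.

```python
def remove_repeated_phrases(tokens, max_n=4, min_n=2):
    """Collapse immediately repeated n-grams (min_n <= n <= max_n) into one copy.

    min_n defaults to 2 so that legitimately doubled single words are kept.
    """
    out = list(tokens)
    # Try longer phrases first so a long repeat is removed as a whole.
    for n in range(max_n, min_n - 1, -1):
        i = 0
        while i + 2 * n <= len(out):
            if out[i:i + n] == out[i + n:i + 2 * n]:
                # Drop the duplicate copy and re-check the same position,
                # which also handles phrases repeated three or more times.
                del out[i + n:i + 2 * n]
            else:
                i += 1
    return out


# Hypothetical generated output with a repeated phrase.
noisy = "in his salon in his salon full of books".split()
cleaned = remove_repeated_phrases(noisy)
print(" ".join(cleaned))
```

Such a filter only removes adjacent duplicates; repetitions separated by other words, or repeated phrases longer than `max_n` tokens, would need a different strategy.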

Overall, both approaches have advantages and limitations,
and both could benefit from a larger parallel
simple/complex Arabic corpus. Moreover, adding a
post-handler language generation module could resolve
some of the limitations, even if it acts only as a fast,
less accurate alternative, for example by removing the
repeated phrase patterns produced by the generative
system. Even though Arabic text simplification is a very
challenging task, this research has considerable potential
for leading to a better-developed system.